• Linear Regression: Standard linear regression models a relationship between a dependent variable (y) and an independent variable (x) as a straight line:

y = β₀ + β₁x

Where:

β₀ is the intercept.

β₁ is the slope.

  • Introducing the Quadratic Term: Quadratic regression extends linear regression by adding a squared term of the independent variable (x²):

y = β₀ + β₁x + β₂x²

Where:

β₂ is the coefficient of the squared term.

The Curve:

The x² term introduces a curve into the relationship.

If β₂ is positive, the curve opens upward (like a U).

If β₂ is negative, the curve opens downward (like an inverted U).

1 Sheet 1

1.1 What is the relationship between population and IGF revenue performance patterns?

# Descriptive statistics
Cleaned_TaMA_Data %>% skim(Population)
Data summary
Name Piped data
Number of rows 11
Number of columns 76
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Population 0 1 552181.8 97862.98 411000 475500 549000 627000 701000 ▇▅▅▅▅
Cleaned_TaMA_Data %>% skim(IGF)
Data summary
Name Piped data
Number of rows 11
Number of columns 76
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
IGF 0 1 1799544 679366.3 945774.9 1348552 1608987 2213410 3215765 ▇▇▅▅▂
# Histograms
ggplot(Cleaned_TaMA_Data, aes(x = Population)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of Population", x = "Population", y = "Frequency") +
  scale_x_continuous(labels = comma)

ggplot(Cleaned_TaMA_Data, aes(x = IGF)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of IGF Revenue", x = "IGF Revenue", y = "Frequency") +
  scale_x_continuous(labels = comma)

# Growth Rate (Percentage)
Cleaned_TaMA_Data <- Cleaned_TaMA_Data %>%
  mutate(
    Population_Growth_Rate = c(NA, diff(Population) / Population[-length(Population)] * 100),
    IGF_Growth_Rate = c(NA, diff(IGF) / IGF[-length(IGF)] * 100)
  )

# Plot of Trends


ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population)) +
  geom_point(aes(y = Population), color = "dodgerblue") +
  labs(title = "Population Trend", x = "Year", y = "Population") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = IGF)) +
  geom_point(aes(y = IGF), color = "dodgerblue") +
  labs(title = "IGF Trend", x = "Year", y = "IGF") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population, color = "Population")) +
  geom_point(aes(y = Population, color = "Population")) +
  geom_line(aes(y = IGF, color = "IGF")) +
    geom_point(aes(y = IGF, color = "IGF")) +
  labs(title = "Population vs. IGF Revenue", x = "Year", y = "Amount/Population", color = "Type") +
  scale_y_continuous(labels = comma)

# Growth rate plots
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population_Growth_Rate, color = "Population Growth")) +
    geom_point(aes(y = Population_Growth_Rate, color = "Population Growth")) +
  geom_line(aes(y = IGF_Growth_Rate, color = "IGF Growth")) +
    geom_point(aes(y = IGF_Growth_Rate, color = "IGF Growth")) +
  labs(title = "Population Growth vs. IGF Growth", x = "Year", y = "Growth Rate (%)", color = "Type") +
  scale_y_continuous(labels = percent_format(scale = 1)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") # Add horizontal line at zero

The histograms show an uneven distribution of population and IGF revenue. The population had the highest around 450,000. The trends plots show clear that the trend of IGF Revenue ( which experienced significant changes) is in the same direction to the trend of Population( which stable rise).

1.1.1 Regression Analysis

mod1 <- lm(IGF ~ Population, data = Cleaned_TaMA_Data)
summary(mod1)
## 
## Call:
## lm(formula = IGF ~ Population, data = Cleaned_TaMA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -280063 -142033   -5919  109601  433935 
## 
## Coefficients:
##                  Estimate    Std. Error t value   Pr(>|t|)    
## (Intercept) -1845174.9832   401403.2226  -4.597     0.0013 ** 
## Population         6.6006        0.7168   9.209 0.00000708 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221800 on 9 degrees of freedom
## Multiple R-squared:  0.9041, Adjusted R-squared:  0.8934 
## F-statistic:  84.8 on 1 and 9 DF,  p-value: 0.000007076
Cleaned_TaMA_Data %>%
  ggplot(aes(x = Population, y = IGF)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) + 
  labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and IGF Revenue") + 
  scale_y_continuous(labels = scales::comma)

# The Quadratic Term
Cleaned_TaMA_Data$Population_Squared <- Cleaned_TaMA_Data$Population^2

#  Quadratic Regression
mod_quad <- lm(IGF ~ Population + Population_Squared, data = Cleaned_TaMA_Data)

summary(mod_quad)
## 
## Call:
## lm(formula = IGF ~ Population + Population_Squared, data = Cleaned_TaMA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -325625  -76598  -40625  129711  307495 
## 
## Coefficients:
##                             Estimate        Std. Error t value Pr(>|t|)
## (Intercept)        1172389.975557798 2698329.843956656   0.434    0.675
## Population              -4.586448127       9.920722573  -0.462    0.656
## Population_Squared       0.000010075       0.000008912   1.131    0.291
## 
## Residual standard error: 218500 on 8 degrees of freedom
## Multiple R-squared:  0.9173, Adjusted R-squared:  0.8966 
## F-statistic: 44.35 on 2 and 8 DF,  p-value: 0.00004685
ggplot(Cleaned_TaMA_Data, aes(x = Population, y = IGF)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) + # Use formula for quadratic
  labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Quadratic Relationship between Population and IGF Revenue") +
  scale_y_continuous(labels = comma)

Linear Regression:

Coefficients:

Intercept: -1845174.9832

Population: 6.6006 . For each unit increase in population, IGF is predicted to increase by approximately 6.60 Ghana Cedis.

P-values: Intercept: 0.0013 (significant)

Population: 0.00000708 (significant)

R-squared: Multiple R-squared: 0.9041

Adjusted R-squared:0.8934

Interpretation: The linear model shows a very strong and statistically significant relationship between population and IGF revenue.
Population explains as high as 90.41% of the variance in IGF.

Quadratic Regression:

Coefficients: Intercept: 1172389.975557

Population: -4.586448127

Population_Squared: 0.000010075

P-values: All coefficients are statistically insignificant (p > 0.01). But the overall model is statistically significant ( p-value = 0.00004685).

R-squared: Multiple R-squared: 0.9173

Adjusted R-squared: 0.8966

Interpretation: The quadratic model shows a strong and statistically significant relationship between population and IGF revenue. The insignificant quadratic terms confirm that the relationship is linear and not non-linear relationship.

The R-squared of 0.9173 indicates that the quadratic model explains 91.73% of the variance in IGF, a little improvement of the linear model but since it is linear the coefficients p-values are non-significant.

  • Based on the statistical significance of the coefficients, the linear model is preferable.

  • Transformations

# Transformed Model
lm(Ln_IGF ~ Ln_Pop, data = Cleaned_TaMA_Data) %>% summary()
## 
## Call:
## lm(formula = Ln_IGF ~ Ln_Pop, data = Cleaned_TaMA_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.115868 -0.058994 -0.004073  0.063445  0.126499 
## 
## Coefficients:
##             Estimate Std. Error t value    Pr(>|t|)    
## (Intercept)  -12.704      2.140  -5.936    0.000219 ***
## Ln_Pop         2.048      0.162  12.638 0.000000495 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09207 on 9 degrees of freedom
## Multiple R-squared:  0.9467, Adjusted R-squared:  0.9407 
## F-statistic: 159.7 on 1 and 9 DF,  p-value: 0.0000004948
# Scatter Plots (Transformed Data)
ggplot(Cleaned_TaMA_Data, aes(x = Ln_Pop, y = Ln_IGF)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Log(Population) vs. Log(IGF Revenue)", x = "Log(Population)", y = "Log(IGF Revenue)")

After the log transformation the log model showed a stronger improvement of the linear relationship than the simple linear model and the relationship now strongly significant (p-value: 0.0000004948 and R-squared: 0.9467 ). The log model provides the best fit among the models so far.

  • Checking Regression Assumptions
# Scatter Plot

ggplot(Cleaned_TaMA_Data, aes(x = Population, y = IGF)) +
  geom_point() +
  labs(title = "Population vs. IGF Revenue", x = "Population", y = "IGF Revenue")

# Residual
ggplot(data = data.frame(residuals = residuals(mod1), fitted = fitted(mod1)), aes(x = fitted, y = residuals)) +
  geom_point() + # Added geom_point()
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted (Linear) ", x = "Fitted Values", y = "Residuals")

ggplot(data = data.frame(residuals = residuals(mod1)), aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals(Linear)", x = "Residuals")

ggplot(data = data.frame(residuals = residuals(mod1)), aes(sample = residuals)) +
  geom_point(stat = "qq") +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Residuals")

#  Residuals vs. Fitted Values
ggplot(data = data.frame(residuals = residuals(mod_quad), fitted = fitted(mod_quad)), 
       aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted (Quadratic Model)", x = "Fitted Values", y = "Residuals")

#  Histogram of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals (Quadratic Model)", x = "Residuals")

#  Q-Q Plot of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(sample = residuals)) +
  geom_point(stat = "qq") +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Residuals (Quadratic Model)")

#  Durbin-Watson Test (Autocorrelation)
dwtest(mod1)
## 
##  Durbin-Watson test
## 
## data:  mod1
## DW = 1.9465, p-value = 0.3137
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(mod_quad)
## 
##  Durbin-Watson test
## 
## data:  mod_quad
## DW = 2.0658, p-value = 0.233
## alternative hypothesis: true autocorrelation is greater than 0
#  Breusch-Pagan Test (Homoscedasticity)
bptest(mod1)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod1
## BP = 6.7204, df = 1, p-value = 0.009532
bptest(mod_quad)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod_quad
## BP = 9.3229, df = 2, p-value = 0.009453
#  Variance Inflation Factor (VIF) - Multicollinearity
bptest(mod1)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod1
## BP = 6.7204, df = 1, p-value = 0.009532
vif(mod_quad)
##         Population Population_Squared 
##           197.4834           197.4834

For the linear model all the assumptions are met for except Homoscedasticity but for the quadratic model the Homoscedasticity and Multicollinearity assumptions are not satisfied.

  • Diagnostics

To address heteroscedasticity of the linear model we use Robust Standard Errors

 #1. 

# Robust standard errors using HC3 (a common method)
robust_se <- coeftest(mod1, vcov. = vcovHC(mod1, type = "HC3"))
print(robust_se)
## 
## t test of coefficients:
## 
##                  Estimate    Std. Error t value  Pr(>|t|)    
## (Intercept) -1845174.9832   574978.9583 -3.2091 0.0106740 *  
## Population         6.6006        1.1645  5.6680 0.0003065 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#  Diagnostic Plots

# Residuals vs. Fitted
ggplot(data = data.frame(residuals = residuals(mod1), fitted = fitted(mod1)),
       aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted (Linear Model)", x = "Fitted Values", y = "Residuals")

cat("
## Linear Model with Robust Standard Errors

", capture.output(print(robust_se)), "

The Breusch-Pagan test indicated heteroscedasticity (p < 0.05). To address this, robust standard errors (HC3) were used. The robust standard errors adjust for the non-constant variance of the residuals to provide more reliable estimates of the coefficients' standard errors and p-values.

The linear model shows a highly significant relationship between Population and IGF revenue, even when using robust standard errors. The R-squared value is ", summary(mod1)$r.squared,", indicating that ", round(summary(mod1)$r.squared * 100, 2), "% of the variance in IGF is explained by Population.
")
## 
## ## Linear Model with Robust Standard Errors
## 
##   t test of coefficients:                   Estimate    Std. Error t value  Pr(>|t|)     (Intercept) -1845174.9832   574978.9583 -3.2091 0.0106740 *   Population         6.6006        1.1645  5.6680 0.0003065 *** --- Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
## 
## The Breusch-Pagan test indicated heteroscedasticity (p < 0.05). To address this, robust standard errors (HC3) were used. The robust standard errors adjust for the non-constant variance of the residuals to provide more reliable estimates of the coefficients' standard errors and p-values.
## 
## The linear model shows a highly significant relationship between Population and IGF revenue, even when using robust standard errors. The R-squared value is  0.9040508 , indicating that  90.41 % of the variance in IGF is explained by Population.

Therefore from the analysis so far we found a strong and statistically significant positive linear relationship between population and IGF revenue. Population growth is a strong indicator of increased IGF revenue performace pattern. The assumptions are met.

1.2 What is the relationship between population and DACF revenue performance patterns?

Cleaned_TaMA_Data %>% skim(Population)
Data summary
Name Piped data
Number of rows 11
Number of columns 79
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Population 0 1 552181.8 97862.98 411000 475500 549000 627000 701000 ▇▅▅▅▅
Cleaned_TaMA_Data %>% skim(DACF)
Data summary
Name Piped data
Number of rows 11
Number of columns 79
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
DACF 0 1 2939617 1388239 802346.2 1971853 3299111 3928373 4833171 ▇▁▅▇▇
# Histograms
ggplot(Cleaned_TaMA_Data, aes(x = Population)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of Population", x = "Population")

ggplot(Cleaned_TaMA_Data, aes(x = DACF)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of DACF Revenue", x = "DACF Revenue")

#Growth Rates and Per Capita Values
Cleaned_TaMA_Data <- Cleaned_TaMA_Data %>%
  mutate(
    Population_Growth_Rate = c(NA, diff(Population) / Population[-length(Population)] * 100),
    DACF_Growth_Rate = c(NA, diff(DACF) / DACF[-length(DACF)] * 100)
  )




# Plotting Trends

ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population)) +
  geom_point(aes(y = Population), color = "dodgerblue") +
  labs(title = "Population Trend", x = "Year", y = "Population") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = DACF)) +
  geom_point(aes(y = DACF), color = "dodgerblue") +
  labs(title = "DACF Trend", x = "Year", y = "IGF") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population, color = "Population")) +
  geom_point(aes(y = Population, color = "Population")) +
  geom_line(aes(y = DACF, color = "DACF")) +
  geom_point(aes(y = DACF, color = "DACF")) +
  labs(title = "Population vs. DACF Revenue", x = "Year", y = "Amount/Population", color = "Type") +
  scale_y_continuous(labels = scales::comma)

# Plotting Growth Rates
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population_Growth_Rate, color = "Population Growth")) +
  geom_point(aes(y = Population_Growth_Rate, color = "Population Growth")) +
  geom_line(aes(y = DACF_Growth_Rate, color = "DACF Growth")) +
  geom_point(aes(y = DACF_Growth_Rate, color = "DACF Growth")) +
  labs(title = "Population Growth vs. DACF Growth", x = "Year", y = "Growth Rate (%)", color = "Type")+
  geom_hline(yintercept = 0, linetype = "dashed", color = "red")

The histograms show an uneven distribution of population and DACF revenue. The trends plots show clear that the trend of DACF Revenue ( which experienced significant changes) moves in the same direction as the trend of Population( which had a stable rise).

1.2.1 Regression Analysis

mod2 <- lm(DACF ~ Population, data = Cleaned_TaMA_Data)
summary(mod2)
## 
## Call:
## lm(formula = DACF ~ Population, data = Cleaned_TaMA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -934903 -463153 -325069  190219 2052059 
## 
## Coefficients:
##                 Estimate   Std. Error t value Pr(>|t|)   
## (Intercept) -3379580.055  1564725.131  -2.160  0.05908 . 
## Population        11.444        2.794   4.096  0.00269 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 864700 on 9 degrees of freedom
## Multiple R-squared:  0.6508, Adjusted R-squared:  0.612 
## F-statistic: 16.78 on 1 and 9 DF,  p-value: 0.002694
Cleaned_TaMA_Data %>%
  ggplot(aes(x = Population, y = DACF)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) + # Added confidence intervals
  labs(x = "Population", y = "DACF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and DACF Revenue") +
  scale_y_continuous(labels = scales::comma)

There is a statistically significant positive relationship between population and DACF revenue performance patterns. As population increases, DACF tends to increase. Population explains only 65.08% of the variance in DACF.

  • Checking Regression Assumptions
 #Scatter Plot 
ggplot(Cleaned_TaMA_Data, aes(x = Population, y = DACF)) +
  geom_point() +
  labs(title = "Population vs. DACF Revenue",
       x = "Population", y = "DACF Revenue")

#  Residual 
ggplot(data = data.frame(residuals = residuals(mod2),
                        fitted = fitted(mod2)),
       aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted",
       x = "Fitted Values", y = "Residuals")

ggplot(data = data.frame(residuals = residuals(mod2)),
       aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals", x = "Residuals")

ggplot(data = data.frame(residuals = residuals(mod2)),
       aes(sample = residuals)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Residuals ")

shapiro.test(resid(mod2))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(mod2)
## W = 0.83717, p-value = 0.029
# Autocorrelation
dwtest(mod2)
## 
##  Durbin-Watson test
## 
## data:  mod2
## DW = 2.1657, p-value = 0.4609
## alternative hypothesis: true autocorrelation is greater than 0
# Homoscedasticity (Constant Variance of Residuals)

bptest(mod2)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod2
## BP = 0.99812, df = 1, p-value = 0.3178
# Multicollinearity
#simple linear regression with one predictor(population), multicollinearity is not an issue.


# Multivariate Normality

#It is a simple linear regression with one predictor(population), multicollinearity therefore this is not an issue.

The scatter plot shows a positive but linear relationship. It shows that as population increases DACF revenue tends to increase as well. The histogram plot show a potential violation of the normality assumption and the test confirms it. The Durbin-Watson test revealed no autocorrelation, and the Breusch-Pagan test shows homoscedasticity.

  • Transforming the linear regression
#Transformed Models
log_mod2 <- lm(log(DACF) ~ log(Population), data = Cleaned_TaMA_Data)
summary(log_mod2 )
# 
# Call:
# lm(formula = log(DACF) ~ log(Population), data = Cleaned_TaMA_Data)
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -0.47510 -0.15097 -0.09072  0.09068  0.83725 
# 
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)   
# (Intercept)     -24.0066     8.7533  -2.743  0.02275 * 
# log(Population)   2.9340     0.6627   4.427  0.00165 **
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Residual standard error: 0.3766 on 9 degrees of freedom
# Multiple R-squared:  0.6853,  Adjusted R-squared:  0.6504 
# F-statistic:  19.6 on 1 and 9 DF,  p-value: 0.001654
sqrt_mod2 <- lm( sqrt(DACF)~sqrt(Population), data = Cleaned_TaMA_Data )  
summary(sqrt_mod2)
# 
# Call:
# lm(formula = sqrt(DACF) ~ sqrt(Population), data = Cleaned_TaMA_Data)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -328.74 -131.72  -67.78   52.94  640.34 
# 
# Coefficients:
#                   Estimate Std. Error t value Pr(>|t|)   
# (Intercept)      -2529.334    981.892  -2.576  0.02989 * 
# sqrt(Population)     5.656      1.321   4.280  0.00205 **
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Residual standard error: 276.3 on 9 degrees of freedom
# Multiple R-squared:  0.6706,  Adjusted R-squared:  0.634 
# F-statistic: 18.32 on 1 and 9 DF,  p-value: 0.002049
#  Scatter Plots (Transformed Data)
ggplot(Cleaned_TaMA_Data, aes(x = log(Population), y = log(DACF))) +
  geom_point() +
  geom_smooth(method = "lm")+
  labs(title = "Log(Population) vs. Log(DACF Revenue)",
       x = "Log(Population)", y = "Log(DACF Revenue)")

ggplot(Cleaned_TaMA_Data, aes(x = log(Population), y = log(DACF))) +
  geom_point() +
  geom_smooth(method = "lm")+
  labs(title = "Sqrt(Population) vs. Sqrt(DACF Revenue)",
       x = "Sqrt(Population)", y = "Sqrt(DACF Revenue)")

Both the log-log and square root transformations are statistically significant and have improved the model fit compared to the linear model. The log-log model is slightly better due to with higher R-squared.

# Function to perform diagnostic tests and plots
perform_diagnostics <- function(model, model_name) {
  # Residuals vs. Fitted
  plot1 <- ggplot(data = data.frame(residuals = residuals(model), fitted = fitted(model)),
                 aes(x = fitted, y = residuals)) +
    geom_point() +
    geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
    labs(title = paste("Residuals vs. Fitted (", model_name, ")"), x = "Fitted Values", y = "Residuals")

  # Histogram of Residuals
  plot2 <- ggplot(data = data.frame(residuals = residuals(model)), aes(x = residuals)) +
    geom_histogram(bins = 10, fill = "skyblue", color = "black") +
    labs(title = paste("Histogram of Residuals (", model_name, ")"), x = "Residuals")

  # Q-Q Plot of Residuals
  plot3 <- ggplot(data = data.frame(residuals = residuals(model)), aes(sample = residuals)) +
    geom_point(stat = "qq") +
    stat_qq_line() +
    labs(title = paste("Q-Q Plot of Residuals (", model_name, ")"))

  # Durbin-Watson Test
  dw_test <- dwtest(model)
  print(paste("Durbin-Watson Test (", model_name, "):"))
  print(dw_test)

  # Breusch-Pagan Test
  bp_test <- bptest(model)
  print(paste("Breusch-Pagan Test (", model_name, "):"))
  print(bp_test)

  # Print VIF (if applicable)
  if (length(coef(model)) > 2) { # Check for multiple predictors
    vif_result <- vif(model)
    print(paste("VIF (", model_name, "):"))
    print(vif_result)
  }

  # Arrange plots
  grid.arrange(plot1, plot2, plot3, nrow = 1)
}

# Perform diagnostics for each model
perform_diagnostics(mod2, "Linear Model")
## [1] "Durbin-Watson Test ( Linear Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 2.1657, p-value = 0.4609
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Linear Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 0.99812, df = 1, p-value = 0.3178

perform_diagnostics(log_mod2, "Log-Log Model")
## [1] "Durbin-Watson Test ( Log-Log Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 1.9478, p-value = 0.3146
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Log-Log Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 1.4224, df = 1, p-value = 0.233

perform_diagnostics(sqrt_mod2, "Square Root Model")
## [1] "Durbin-Watson Test ( Square Root Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 2.0621, p-value = 0.3893
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Square Root Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 1.2228, df = 1, p-value = 0.2688

shapiro.test(resid(mod2))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(mod2)
## W = 0.83717, p-value = 0.029
shapiro.test(resid(log_mod2))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(log_mod2)
## W = 0.88369, p-value = 0.1159
shapiro.test(resid(sqrt_mod2))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(sqrt_mod2)
## W = 0.86368, p-value = 0.06426

The diagnostic tests indicate that all the three models satisfy the assumptions of no autocorrelation and homoscedasticity. The Shapiro-Wilk normality tests confirm that the only linear model violates the normality assumption.

Therefore, from the regression analysis results all three models appear to be valid. The log-log model is the best with a higher R-squared and smaller residuals, it met all the assumption. It relationship between population and DACF revenue performance is better captured by the log model.

1.3 What is the relationship between population, recurerent and capital expenditure?

  • Descriptive Statistics
# Calculate descriptive statistics
desc_stats <- Cleaned_TaMA_Data %>%
  summarize(
    Population_mean = mean(Population),
    Population_sd = sd(Population),
    Population_min = min(Population),
    Population_max = max(Population),
    Capital_Expenditure_mean = mean(Capital_Expenditure),
    Capital_Expenditure_sd = sd(Capital_Expenditure),
    Capital_Expenditure_min = min(Capital_Expenditure),
    Capital_Expenditure_max = max(Capital_Expenditure),
    Recrrent_Expenditure_mean = mean(Recrrent_Expenditure),
    Recrrent_Expenditure_sd = sd(Recrrent_Expenditure),
    Recrrent_Expenditure_min = min(Recrrent_Expenditure),
    Recrrent_Expenditure_max = max(Recrrent_Expenditure)
  )


cat("
## Descriptive Statistics

| Statistic               | Population | Capital Expenditure | Recurrent Expenditure |
|------------------------|------------|---------------------|-----------------------|
| Mean                   |", format(desc_stats$Population_mean, big.mark = ",", digits = 2),
  "|", format(desc_stats$Capital_Expenditure_mean, big.mark = ",", digits = 2),
  "|", format(desc_stats$Recrrent_Expenditure_mean, big.mark = ",", digits = 2), "|
| Standard Deviation     |", format(desc_stats$Population_sd, big.mark = ",", digits = 2),
  "|", format(desc_stats$Capital_Expenditure_sd, big.mark = ",", digits = 2),
  "|", format(desc_stats$Recrrent_Expenditure_sd, big.mark = ",", digits = 2), "|
| Minimum                |", format(desc_stats$Population_min, big.mark = ",", digits = 2),
  "|", format(desc_stats$Capital_Expenditure_min, big.mark = ",", digits = 2),
  "|", format(desc_stats$Recrrent_Expenditure_min, big.mark = ",", digits = 2), "|
| Maximum                |", format(desc_stats$Population_max, big.mark = ",", digits = 2),
  "|", format(desc_stats$Capital_Expenditure_max, big.mark = ",", digits = 2),
  "|", format(desc_stats$Recrrent_Expenditure_max, big.mark = ",", digits = 2), "|
\n")
## 
## ## Descriptive Statistics
## 
## | Statistic               | Population | Capital Expenditure | Recurrent Expenditure |
## |------------------------|------------|---------------------|-----------------------|
## | Mean                   | 552,182 | 6,065,533 | 2,681,829 |
## | Standard Deviation     | 97,863 | 3,978,961 | 1,165,192 |
## | Minimum                | 411,000 | 3,061,667 | 864,055 |
## | Maximum                | 701,000 | 15,444,357 | 4,119,225 |
# Capital Expenditure Histogram
cap_hist <- ggplot(Cleaned_TaMA_Data, aes(x = Capital_Expenditure)) +
  geom_histogram(aes(y = ..density..), bins = 10, fill = "skyblue", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of Capital Expenditure", x = "Capital Expenditure (Ghana Cedis)", y = "Density") +
  scale_x_continuous(labels = comma) 

# Recurrent Expenditure Histogram
rec_hist <- ggplot(Cleaned_TaMA_Data, aes(x = Recrrent_Expenditure)) +
  geom_histogram(aes(y = ..density..), bins = 10, fill = "lightgreen", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of Recurrent Expenditure", x = "Recurrent Expenditure (Ghana Cedis)", y = "Density") +
  scale_x_continuous(labels = comma) 

# Population Histogram
pop_hist <- ggplot(Cleaned_TaMA_Data, aes(x = Population)) +
  geom_histogram(aes(y = ..density..), bins = 10, fill = "dodgerblue", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of Population", x = "Population", y = "Density") +
  scale_x_continuous(labels = comma) 

cap_hist

rec_hist

pop_hist

  • Trends
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population)) +
  geom_point(aes(y = Population), color = "dodgerblue") +
  labs(title = "Population Trend", x = "Year", y = "Population") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
  geom_point(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
  labs(title = " Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
  theme(axis.title.y.right = element_text(vjust=2))

ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population, color = "Population")) +
  geom_point(aes(y = Population, color = "Population")) +
  geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
  geom_point(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
  labs(title = "Population and Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
  scale_y_continuous(labels = comma, sec.axis = sec_axis(~., name = "Population")) +
  theme(axis.title.y.right = element_text(vjust=2))

ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
  geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
  geom_line(aes(y = Rec_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
  geom_point(aes(y = Rec_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
  labs(title = "Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
  scale_y_continuous(labels = comma)

# Calculate Per Capita Values
Cleaned_TaMA_Data$Capital_Exp_Per_Capita <- Cleaned_TaMA_Data$Capital_Expenditure / Cleaned_TaMA_Data$Population

# Plotting Trends (Improved)
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population, color = "Population")) +
  geom_point(aes(y = Population, color = "Population")) +
  geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  labs(title = "Population and Capital Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
  scale_y_continuous(labels = comma, sec.axis = sec_axis(~., name = "Population")) +
  theme(axis.title.y.right = element_text(vjust=2))

# Per Capita Analysis 
average_capita <- mean(Cleaned_TaMA_Data$Capital_Exp_Per_Capita)

ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
  geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
  geom_hline(yintercept = average_capita, linetype = "dashed", color = "red")+
  labs(title = "Capital Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
  scale_y_continuous(labels = comma) 

Cleaned_TaMA_Data$Recrrent_Exp_Per_Capita <- Cleaned_TaMA_Data$Recrrent_Expenditure / Cleaned_TaMA_Data$Population
average_rec_capita <- mean(Cleaned_TaMA_Data$Recrrent_Exp_Per_Capita)

ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
  geom_point(aes(y = Recrrent_Exp_Per_Capita, color = "Recrrent Exp. Per Capita")) +
  geom_hline(yintercept = average_rec_capita, linetype = "dashed", color = "red") +
  labs(title = "Recurrent Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population, color = "Population")) +
  geom_point(aes(y = Population, color = "Population")) +
  geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
  geom_point(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
  labs(title = "Population and Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
  scale_y_continuous(labels = comma, sec.axis = sec_axis(~., name = "Population")) +
  theme(axis.title.y.right = element_text(vjust=2))

ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
  geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
  geom_line(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
  geom_point(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
  labs(title = "Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
  scale_y_continuous(labels = comma)

1.3.1 Regression Results

mod3 <- lm(cbind(Capital_Expenditure, Recrrent_Expenditure) ~ Population, data = Cleaned_TaMA_Data)
summary(mod3)
## Response Capital_Expenditure :
## 
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_TaMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -4708778 -1461251  -838424  -180257  8966045 
## 
## Coefficients:
##                Estimate  Std. Error t value Pr(>|t|)
## (Intercept) 12733679.01  7247229.12   1.757    0.113
## Population       -12.08       12.94  -0.933    0.375
## 
## Residual standard error: 4005000 on 9 degrees of freedom
## Multiple R-squared:  0.08822,    Adjusted R-squared:  -0.01309 
## F-statistic: 0.8708 on 1 and 9 DF,  p-value: 0.3751
## 
## 
## Response Recrrent_Expenditure :
## 
## Call:
## lm(formula = Recrrent_Expenditure ~ Population, data = Cleaned_TaMA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -930985 -527287  105797  429788 1144183 
## 
## Coefficients:
##                 Estimate   Std. Error t value Pr(>|t|)   
## (Intercept) -2747965.898  1253148.192  -2.193  0.05599 . 
## Population         9.833        2.238   4.394  0.00173 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 692500 on 9 degrees of freedom
## Multiple R-squared:  0.6821, Adjusted R-squared:  0.6468 
## F-statistic: 19.31 on 1 and 9 DF,  p-value: 0.001735
mod_cap <- lm(Capital_Expenditure ~ Population, data = Cleaned_TaMA_Data)
summary(mod_cap)
## 
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_TaMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -4708778 -1461251  -838424  -180257  8966045 
## 
## Coefficients:
##                Estimate  Std. Error t value Pr(>|t|)
## (Intercept) 12733679.01  7247229.12   1.757    0.113
## Population       -12.08       12.94  -0.933    0.375
## 
## Residual standard error: 4005000 on 9 degrees of freedom
## Multiple R-squared:  0.08822,    Adjusted R-squared:  -0.01309 
## F-statistic: 0.8708 on 1 and 9 DF,  p-value: 0.3751
mod_rec <- lm(Recrrent_Expenditure ~ Population, data = Cleaned_TaMA_Data)
summary(mod_rec)
## 
## Call:
## lm(formula = Recrrent_Expenditure ~ Population, data = Cleaned_TaMA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -930985 -527287  105797  429788 1144183 
## 
## Coefficients:
##                 Estimate   Std. Error t value Pr(>|t|)   
## (Intercept) -2747965.898  1253148.192  -2.193  0.05599 . 
## Population         9.833        2.238   4.394  0.00173 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 692500 on 9 degrees of freedom
## Multiple R-squared:  0.6821, Adjusted R-squared:  0.6468 
## F-statistic: 19.31 on 1 and 9 DF,  p-value: 0.001735
Cleaned_TaMA_Data %>% 
  ggplot(aes(x = Population, y = Capital_Expenditure)) +
  geom_point()+
  geom_smooth(method = "lm", se = TRUE) + labs(x = "Population", y = "Capital Expenditure", title = "Linear Relationship Population and Capital Expenditure")+
   scale_y_continuous(labels = scales::comma)

Cleaned_TaMA_Data %>%
  ggplot(aes(x = Population, y = Recrrent_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Population", y = "Recurrent Expenditure", title = "Linear Relationship Population and Recurrent Expenditure") +
  scale_y_continuous(labels = scales::comma)

From the linear regression results there is a significant positive linear relationship between Population and Recrrent_Expenditure(p-value: 0.001735, R-squared: 0.6821) but non-significant relationship between Population and Capital_Expenditure (p-value: 0.3751, R-squared: 0.08822). For each unit increase in Population, Recrrent_Expenditure is estimated to increase by 9.833 Ghana Cedis.

  • Checking Regression Assumptions
# Diagnostic Function
perform_diagnostics <- function(model, model_name) {
  # Residuals vs. Fitted
  plot1 <- ggplot(data = data.frame(residuals = residuals(model), fitted = fitted(model)),
                 aes(x = fitted, y = residuals)) +
    geom_point() +
    geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
    labs(title = paste("Residuals vs. Fitted (", model_name, ")"), x = "Fitted Values", y = "Residuals")

  # Histogram of Residuals
  plot2 <- ggplot(data = data.frame(residuals = residuals(model)), aes(x = residuals)) +
    geom_histogram(bins = 10, fill = "skyblue", color = "black") +
    labs(title = paste("Histogram of Residuals (", model_name, ")"), x = "Residuals")

  # Q-Q Plot of Residuals
  plot3 <- ggplot(data = data.frame(residuals = residuals(model)), aes(sample = residuals)) +
    geom_point(stat = "qq") +
    stat_qq_line() +
    labs(title = paste("Q-Q Plot of Residuals (", model_name, ")"))

  # Durbin-Watson Test
  dw_test <- dwtest(model)
  print(paste("Durbin-Watson Test (", model_name, "):"))
  print(dw_test)

  # Breusch-Pagan Test
  bp_test <- bptest(model)
  print(paste("Breusch-Pagan Test (", model_name, "):"))
  print(bp_test)

  # Print VIF (if applicable)
  if (length(coef(model)) > 2) { # Check for multiple predictors
    vif_result <- vif(model)
    print(paste("VIF (", model_name, "):"))
    print(vif_result)
  }

  # Arrange plots
  grid.arrange(plot1, plot2, plot3, nrow = 1)
}

#  Perform Diagnostics
# Capital Expenditure
perform_diagnostics(mod_cap, "Capital Expenditure Model")
## [1] "Durbin-Watson Test ( Capital Expenditure Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 0.93098, p-value = 0.005943
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Capital Expenditure Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 1.3067, df = 1, p-value = 0.253

# Recurrent Expenditure
perform_diagnostics(mod_rec, "Recurrent Expenditure Model")
## [1] "Durbin-Watson Test ( Recurrent Expenditure Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 1.8214, p-value = 0.2394
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Recurrent Expenditure Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 0.17041, df = 1, p-value = 0.6797

From the above tests the Recurrent Expenditure Model met all all assumptions but the capital expenditure violates the autocorrelation regression assumption.

  • Transformations
# Log Transformation for Recurrent Expenditure 
log_rec_mod <- lm(log(Recrrent_Expenditure) ~ Population, data = Cleaned_TaMA_Data)
summary(log_rec_mod)
## 
## Call:
## lm(formula = log(Recrrent_Expenditure) ~ Population, data = Cleaned_TaMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.66977 -0.18757  0.06111  0.23307  0.41234 
## 
## Coefficients:
##                 Estimate   Std. Error t value      Pr(>|t|)    
## (Intercept) 12.492400159  0.594054473  21.029 0.00000000583 ***
## Population   0.000003997  0.000001061   3.768       0.00443 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3283 on 9 degrees of freedom
## Multiple R-squared:  0.6121, Adjusted R-squared:  0.569 
## F-statistic:  14.2 on 1 and 9 DF,  p-value: 0.004429
perform_diagnostics(log_rec_mod, "Log Recurrent Expenditure Model")
## [1] "Durbin-Watson Test ( Log Recurrent Expenditure Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 2.1873, p-value = 0.4761
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Log Recurrent Expenditure Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 1.473, df = 1, p-value = 0.2249

log_cap_mod <- lm(log(Capital_Expenditure) ~ Population, data = Cleaned_TaMA_Data)
summary(log_cap_mod)
## 
## Call:
## lm(formula = log(Capital_Expenditure) ~ Population, data = Cleaned_TaMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.79149 -0.24915 -0.03883  0.12082  1.02096 
## 
## Coefficients:
##                 Estimate   Std. Error t value     Pr(>|t|)    
## (Intercept) 16.471755961  0.961751754  17.127 0.0000000355 ***
## Population  -0.000001815  0.000001717  -1.057        0.318    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5315 on 9 degrees of freedom
## Multiple R-squared:  0.1104, Adjusted R-squared:  0.01151 
## F-statistic: 1.116 on 1 and 9 DF,  p-value: 0.3182
perform_diagnostics(log_cap_mod, "Log capital Expenditure Model")
## [1] "Durbin-Watson Test ( Log capital Expenditure Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 0.68088, p-value = 0.0006221
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Log capital Expenditure Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 2.448, df = 1, p-value = 0.1177

Cleaned_TaMA_Data$Ln_Population <- log(Cleaned_TaMA_Data$Population)
Cleaned_TaMA_Data$Ln_Capital_Expenditure <- log(Cleaned_TaMA_Data$Capital_Expenditure)



  

ggplot(Cleaned_TaMA_Data, aes(x = log(Population), y = log(Capital_Expenditure))) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE)+
  labs(title = "Log(Population) vs. Log(Capital Expenditure)",
       x = "Log(Population)", y = "Log(Capital Expenditure)")

  ggplot(Cleaned_TaMA_Data, aes(x = log(Population), y = log(Recrrent_Expenditure))) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Log(Population) vs. Log(Recurrent Expenditure)",
       x = "Log(Population)", y = "Log(Recurrent Expenditure)")

#  Square root transformation for Capital Expenditure
sqrt_cap_mod <- lm(sqrt(Capital_Expenditure) ~ Population, data = Cleaned_TaMA_Data)
summary(sqrt_cap_mod)
## 
## Call:
## lm(formula = sqrt(Capital_Expenditure) ~ Population, data = Cleaned_TaMA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -942.65 -289.22  -94.60   58.88 1483.28 
## 
## Coefficients:
##                Estimate  Std. Error t value Pr(>|t|)  
## (Intercept) 3636.405029 1283.303572   2.834   0.0196 *
## Population    -0.002297    0.002292  -1.002   0.3424  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 709.2 on 9 degrees of freedom
## Multiple R-squared:  0.1004, Adjusted R-squared:  0.0004577 
## F-statistic: 1.005 on 1 and 9 DF,  p-value: 0.3424
perform_diagnostics(sqrt_cap_mod, "Square root Capital Expenditure Model")
## [1] "Durbin-Watson Test ( Square root Capital Expenditure Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 0.79241, p-value = 0.001942
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Square root Capital Expenditure Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 1.8114, df = 1, p-value = 0.1783

From the transformations the recurrent expenditure model are still significant and met the assumptions but the capital expenditure have not.

  • Quadratic model
Cleaned_TaMA_Data$Recrrent_Expenditure_squared <- Cleaned_TaMA_Data$Recrrent_Expenditure^2

Cleaned_TaMA_Data$Capital_Expenditure_squared <- Cleaned_TaMA_Data$Capital_Expenditure^2

mod_quad <- lm(cbind(Capital_Expenditure, Recrrent_Expenditure) ~ Population + Population_Squared, data = Cleaned_TaMA_Data)

# View the summary
summary(mod_quad)
## Response Capital_Expenditure :
## 
## Call:
## lm(formula = Capital_Expenditure ~ Population + Population_Squared, 
##     data = Cleaned_TaMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3090996 -1920285 -1038866  1214987  6900062 
## 
## Coefficients:
##                             Estimate        Std. Error t value Pr(>|t|)  
## (Intercept)        -71641592.0657378  42931967.5280382  -1.669   0.1337  
## Population               300.7286485       157.8443571   1.905   0.0932 .
## Population_Squared        -0.0002817         0.0001418  -1.987   0.0822 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3476000 on 8 degrees of freedom
## Multiple R-squared:  0.3895, Adjusted R-squared:  0.2368 
## F-statistic: 2.552 on 2 and 8 DF,  p-value: 0.139
## 
## 
## Response Recrrent_Expenditure :
## 
## Call:
## lm(formula = Recrrent_Expenditure ~ Population + Population_Squared, 
##     data = Cleaned_TaMA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -930947 -530205  110649  419965 1153130 
## 
## Coefficients:
##                              Estimate         Std. Error t value Pr(>|t|)
## (Intercept)        -2412849.139164112  9071144.591820087  -0.266    0.797
## Population                8.590966910       33.351114993   0.258    0.803
## Population_Squared        0.000001119        0.000029960   0.037    0.971
## 
## Residual standard error: 734500 on 8 degrees of freedom
## Multiple R-squared:  0.6822, Adjusted R-squared:  0.6027 
## F-statistic: 8.585 on 2 and 8 DF,  p-value: 0.01021
#  Scatter Plots (Transformed Data)
ggplot(Cleaned_TaMA_Data, aes(x = Population, y = Capital_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
  labs(x = "Population", y = "Capital Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Capital Expenditure") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_TaMA_Data, aes(x = Population, y = Recrrent_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
  labs(x = "Population", y = "Recurrent Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Recurrent Expenditure") +
  scale_y_continuous(labels = comma)

Quadratic models show improvement of the relationship between population and capital expenditure but the relationship is still non-significant.

There from the regression analysis above the relationship between population and recurrent expenditure is positive linear and significant but the capital expenditure is non-linear and non-significant. The simple linear regression is the best fit model for recurrent expenditure and population.

1.4 What is the relationship between revenue growth and infrastructure delivery (Model)

Using total revenue growth rate and infrastructure delivery (capital expenditure per capita).

# Descriptive statistics
Cleaned_TaMA_Data %>% skim(Capital_Exp_Per_Capita)
Data summary
Name Piped data
Number of rows 11
Number of columns 85
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Capital_Exp_Per_Capita 0 1 11.56 8.18 4.6 6.55 8.36 12.8 29.82 ▇▂▁▁▁
Cleaned_TaMA_Data %>% skim(TtRev_Growth_Rate)
Data summary
Name Piped data
Number of rows 11
Number of columns 85
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
TtRev_Growth_Rate 1 0.91 6.83 23.53 -42.14 4.41 8.92 16.84 39.13 ▂▂▂▇▃
# Histograms
ggplot(Cleaned_TaMA_Data, aes(x = Capital_Exp_Per_Capita)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of Capital expenditure per capita", x = "Capital expenditure per capita") +
  scale_x_continuous(labels = comma)

ggplot(Cleaned_TaMA_Data, aes(x = TtRev_Growth_Rate)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of Total Revenue Growth Rate", x = "Total revenue growth rate") +
  scale_x_continuous(labels = percent)

# Plotting Trends 

ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = TtRev_Growth_Rate, color = "Total Revenue Growth Rate")) +
  geom_point(aes(y = TtRev_Growth_Rate, color = "Total Revenue Growth Rate")) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Expenditure Per Capita")) +
  geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Expenditure Per Capita")) +
  labs(
    title = "Total Revenue Growth Rate vs. Capital Expenditure Per Capita",
    x = "Year",
    y = "Total Revenue Growth Rate (%)"  
  ) +
  scale_y_continuous(
    labels = percent_format(scale = 1),  
    sec.axis = sec_axis(~., name = "Capital Expenditure Per Capita")
  ) +
  scale_color_manual(
    values = c("Total Revenue Growth Rate" = "lightseagreen", "Capital Expenditure Per Capita" = "indianred"),
    name = "Type"
  ) +
  theme(axis.title.y.right = element_text(vjust = 2))

The histograms show an uneven distribution of Capital expenditure per capita.The trends plots show clear that the trend of Total revenue growth rate ( which experienced significant changes) is not directly linked to the trend of Capital expenditure per capita( which remained stable).

1.4.1 Regression results

mod5 <- lm(Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_TaMA_Data)
summary(mod5)
## 
## Call:
## lm(formula = Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_TaMA_Data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.612 -5.977 -2.603  2.961 17.699 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)   
## (Intercept)       11.33810    2.88082   3.936  0.00432 **
## TtRev_Growth_Rate  0.09282    0.12342   0.752  0.47353   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.712 on 8 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.06604,    Adjusted R-squared:  -0.05071 
## F-statistic: 0.5657 on 1 and 8 DF,  p-value: 0.4735
ggplot(Cleaned_TaMA_Data, aes(x = TtRev_Growth_Rate, y = Capital_Exp_Per_Capita)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE)+
  labs(title = "Revenue Growth vs. Capital Expenditure (Per Capita)",
       x = "Total Revenue Growth Rate (%)",
       y = "Capital Expenditure Per Capita")

The regression result show there no statistically significant relationship between total revenue growth rate and infrastructure delivery (capital expenditure per capita) with p-value (0.3194) is greater than 0.05 significance level. This means that changes in revenue growth do not significantly predict changes in capital expenditure per capita in this model. The R-squared (0.1235) indicates only 12.35% of the variation in capital expenditure per capita can be explained by revenue growth (total revenue growth rate)

1.5 What is the relationship between expenditure growth and infrastructure delivery?

  • Regression results using expenditure growth (Expenditure_Growth) and infrastructure delivery (capital expenditure per capita).
Cleaned_TaMA_Data$Expenditure_Growth <- c(NA, diff(Cleaned_TaMA_Data$Total_Expenditure) / Cleaned_TaMA_Data$Total_Expenditure[-nrow(Cleaned_TaMA_Data)]) * 100

mod6 <- lm(Capital_Exp_Per_Capita ~ Expenditure_Growth, data = Cleaned_TaMA_Data)
  summary(mod6)
## 
## Call:
## lm(formula = Capital_Exp_Per_Capita ~ Expenditure_Growth, data = Cleaned_TaMA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.5176 -4.3263 -2.9899  0.8945 16.4278 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)   
## (Intercept)         9.88510    2.52264   3.919  0.00443 **
## Expenditure_Growth  0.14099    0.06878   2.050  0.07452 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.299 on 8 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.3444, Adjusted R-squared:  0.2624 
## F-statistic: 4.202 on 1 and 8 DF,  p-value: 0.07452
  ggplot(Cleaned_TaMA_Data, aes(x = Expenditure_Growth, y = Capital_Exp_Per_Capita)) +
    geom_point() + geom_smooth(method = "lm", se = TRUE)+
    labs(title = "Expenditure Growth vs. Capital Expenditure (Per Capita)",
         x = "Expenditure Growth Rate (%)",
         y = "Capital Expenditure Per Capita")

From the linear regression results there is no statistically significant relationship.

2 SHEET 2

2.1 What is the relationship between allocative and funding decision-making and revenue patterns?

# no variables

2.2 What is the relationship between allocative decision-making and expenditure patterns?

  • No direct variables are available on this, some descriptive statistics of closely related are below
# Expenditure Composition:
Cleaned_TaMA_Data$CapExp_Pct <- (Cleaned_TaMA_Data$Capital_Expenditure / Cleaned_TaMA_Data$Total_Expenditure) 
Cleaned_TaMA_Data$CapExp_Rev_Ratio <- (Cleaned_TaMA_Data$Capital_Expenditure / Cleaned_TaMA_Data$Total_Revenue)



# Expenditure Composition 
ggplot(Cleaned_TaMA_Data, aes(x = Year, y = CapExp_Pct)) +
  geom_bar(stat = "identity", fill = "dodgerblue") +
  geom_point()+
  labs(title = "Capital Expenditure as Percentage of Total Expenditure",
       x = "Year",
       y = "Percentage") +
  scale_y_continuous(labels = percent_format(accuracy = 1))

# Trends of Revenue and Expenditure over the years.

ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Total_Revenue, color = "Total Revenue")) +
  geom_point(aes(y = Total_Revenue)) +  # Added aes(y = Total_Revenue)
  geom_line(aes(y = Total_Expenditure, color = "Total Expenditure")) +
  geom_point(aes(y = Total_Expenditure)) +  # Added aes(y = Total_Expenditure)
  labs(title = "Revenue and Expenditure Trends Over Years",
       x = "Year",
       y = "Amount (Ghana Cedis)", color = "Type") +
  scale_color_manual(values = c("Total Revenue" = "blue", "Total Expenditure" = "red")) +
  scale_y_continuous(labels = comma) 

ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Total_Revenue, color = "Total Revenue"), size = 1) +
  geom_line(aes(y = IGF, color = "IGF"), size = 1) +
  geom_line(aes(y = DACF, color = "DACF"), size = 1) +
  geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure"), size = 1) +
    geom_line(aes(y = Recrrent_Expenditure , color = "Recurrent Expenditure"), size = 1) +
  geom_line(aes(y = Total_Expenditure, color = "Total Expenditure"), size = 1) +
  geom_line(aes(y = Others_Sources, color = "Other Sources"), size = 1) +
  labs(
    title = "Revenue and Expenditure Trends",
    x = "Year",
    y = "Amount (Ghana Cedis)",
    color = "Type"
  ) +
  scale_color_manual(
    values = c(
      "Total Revenue" = "blue",
      "Other Sources" = "skyblue",
      "IGF" = "green",
      "DACF" = "darkgray",
      "Capital Expenditure" = "purple",
      "Total Expenditure" = "red",
      "Recurrent Expenditure" = "yellow"
    )
  ) +
  scale_y_continuous(labels = scales::comma) +
  theme(
    legend.position = "right", 
    legend.title = element_text(face = "bold"), 
    plot.title = element_text(hjust = 0.5, face = "bold") 
  )

# IGF to Total Expenditure Ratio 
ggplot(Cleaned_TaMA_Data, aes(x = Year, y = IGF_TE)) +
  geom_line(color = "steelblue", size = 1) +
  geom_point(size = 2.5) +
  labs(
    title = "IGF to Total Expenditure Ratio Over Years",
    x = "Year",
    y = "Ratio (IGF/Total Expenditure)"
  ) +
  scale_y_continuous(labels = percent_format(accuracy = 1)) 

# CapExp_Rev_Ratio plot.
ggplot(Cleaned_TaMA_Data, aes(x = Year, y = CapExp_Rev_Ratio)) +
  geom_line(color = "steelblue", size = 1) +
  geom_point(size = 2.5) +
  labs(
    title = "Capital Expenditure to Total Revenue Ratio Over Years",
    x = "Year",
    y = "Ratio (Capital Expenditure/Total Revenue)"
  ) +
  scale_y_continuous(labels = comma) 

cor.test(Cleaned_TaMA_Data$Total_Expenditure, Cleaned_TaMA_Data$Total_Revenue)
## 
##  Pearson's product-moment correlation
## 
## data:  Cleaned_TaMA_Data$Total_Expenditure and Cleaned_TaMA_Data$Total_Revenue
## t = 9.1392, df = 9, p-value = 0.00000753
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8144349 0.9872874
## sample estimates:
##       cor 
## 0.9501201

In the above plots, the Capital Expenditure as Percentage of Total Expenditure shows a slightly high capital investment with peak around 2014, followed by a sustained decline after 2016. Also, there is strong correlation between Total Revenue and Total Expenditure, with both peaking around 2016 and fall afterwards.

2.3 What is the relationship between population trend, service delivery and revenue and expenditure patterns?

# Revenue Per Capita
Cleaned_TaMA_Data$Total_Revenue_Per_Capita <- Cleaned_TaMA_Data$Total_Revenue / Cleaned_TaMA_Data$Population
Cleaned_TaMA_Data$IGF_Per_Capita <- Cleaned_TaMA_Data$IGF / Cleaned_TaMA_Data$Population
Cleaned_TaMA_Data$DACF_Per_Capita <- Cleaned_TaMA_Data$DACF / Cleaned_TaMA_Data$Population

# Time Series Plots (Improved)

# Total Revenue and Expenditure Trends
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Total_Revenue, color = "Total Revenue"), size = 1) +
  geom_point(aes(y = Total_Revenue, color = "Total Revenue")) +
  geom_line(aes(y = IGF, color = "IGF"), size = 1) +
  geom_point(aes(y = IGF, color = "IGF")) +
  geom_line(aes(y = DACF, color = "DACF"), size = 1) +
  geom_point(aes(y = DACF, color = "DACF")) +
  geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure"), size = 1) +
  geom_line(aes(y = Recrrent_Expenditure , color = "Recurrent Expenditure"), size = 1) +
  geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_line(aes(y = Total_Expenditure, color = "Total Expenditure"), size = 1) +
  geom_point(aes(y = Total_Expenditure, color = "Total Expenditure")) +
  geom_line(aes(y = Others_Sources, color = "Other Sources"), size = 1) +
  geom_point(aes(y = Others_Sources, color = "Other Sources")) +
  labs(
    title = "Revenue and Expenditure Trends Over Years",
    x = "Year",
    y = "Amount (Ghana Cedis)",
    color = "Type"
  ) +
  scale_color_manual(
    values = c(
      "Total Revenue" = "blue",
      "Other Sources" = "skyblue",
      "IGF" = "green",
      "DACF" = "darkgray",
      "Capital Expenditure" = "purple",
      "Total Expenditure" = "red",
      "Recurrent Expenditure" = "yellow"
    )
  ) +
  scale_y_continuous(labels = comma) +
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold"),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )

# Population Trend
ggplot(Cleaned_TaMA_Data, aes(x = Year, y = Population)) +
  geom_line(color = "steelblue", size = 1) +
  geom_point(size = 2.5) +
  labs(
    title = "Population Trend Over Years",
    x = "Year",
    y = "Population"
  ) +
  scale_y_continuous(labels = comma) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.title = element_text(face = "bold")
  )

# IGF to Total Expenditure Ratio
ggplot(Cleaned_TaMA_Data, aes(x = Year, y = IGF_TE)) +
  geom_line(color = "steelblue", size = 1) +
  geom_point(size = 2.5) +
  labs(
    title = "IGF to Total Expenditure Ratio Over Years",
    x = "Year",
    y = "Ratio (IGF/Total Expenditure)"
  ) +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.title = element_text(face = "bold")
  )

# Per capita plot
ggplot(Cleaned_TaMA_Data, aes(x = Year)) +
  geom_line(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
  geom_point(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
  geom_line(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
  geom_point(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
  geom_line(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
  geom_point(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
  labs(title = "Revenue Per Capita trends", x = "Year", y = "Amount (Ghana Cedis)", color = "Type") +
  scale_y_continuous(labels = comma) 

cor_matrix <- cor(Cleaned_TaMA_Data[, c("Population", "Total_Revenue", "Total_Expenditure", "IGF_TE", "CapExp_Pct", "IGF")], use = "complete.obs")
print(cor_matrix)
##                   Population Total_Revenue Total_Expenditure     IGF_TE
## Population         1.0000000     0.7194963         0.6545468  0.2825390
## Total_Revenue      0.7194963     1.0000000         0.9501201 -0.2968282
## Total_Expenditure  0.6545468     0.9501201         1.0000000 -0.4695060
## IGF_TE             0.2825390    -0.2968282        -0.4695060  1.0000000
## CapExp_Pct        -0.8345614    -0.3341015        -0.2761764 -0.5350044
## IGF                0.9508158     0.7059292         0.5921264  0.4213178
##                   CapExp_Pct        IGF
## Population        -0.8345614  0.9508158
## Total_Revenue     -0.3341015  0.7059292
## Total_Expenditure -0.2761764  0.5921264
## IGF_TE            -0.5350044  0.4213178
## CapExp_Pct         1.0000000 -0.7987978
## IGF               -0.7987978  1.0000000
corrplot(cor_matrix, main = "Correlation matrix of population and expenditure patterns")

In the above there is a strong positive correlation between total revenue and total expenditure and also between IGF.

2.3.1 Regression Analysis

# Total Revenue vs Population
model_revenue_pop <- lm(Total_Revenue ~ Population, data = Cleaned_TaMA_Data)
summary(model_revenue_pop)
## 
## Call:
## lm(formula = Total_Revenue ~ Population, data = Cleaned_TaMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3868956 -1745651 -1182929  1274374  5837028 
## 
## Coefficients:
##                Estimate  Std. Error t value Pr(>|t|)  
## (Intercept) -3297051.78  6117612.55  -0.539   0.6030  
## Population        33.95       10.92   3.108   0.0126 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3381000 on 9 degrees of freedom
## Multiple R-squared:  0.5177, Adjusted R-squared:  0.4641 
## F-statistic:  9.66 on 1 and 9 DF,  p-value: 0.01256
# Total Expenditure vs Population
model_expenditure_pop <- lm(Total_Expenditure ~ Population, data = Cleaned_TaMA_Data)
summary(model_expenditure_pop)
## 
## Call:
## lm(formula = Total_Expenditure ~ Population, data = Cleaned_TaMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3711913 -2764234  -512930   318938  8909224 
## 
## Coefficients:
##                Estimate  Std. Error t value Pr(>|t|)  
## (Intercept) -4000822.24  7428256.85  -0.539   0.6032  
## Population        34.45       13.26   2.597   0.0289 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4105000 on 9 degrees of freedom
## Multiple R-squared:  0.4284, Adjusted R-squared:  0.3649 
## F-statistic: 6.746 on 1 and 9 DF,  p-value: 0.02886
# Capital Expenditure vs Total Revenue and IGF_TE
model_capital_rev_igf <- lm(Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_TaMA_Data)
summary(model_capital_rev_igf)
## 
## Call:
## lm(formula = Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_TaMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2776838 -1487355    16342  1565100  2945904 
## 
## Coefficients:
##                      Estimate      Std. Error t value Pr(>|t|)   
## (Intercept)     17265329.1768    4379246.4356   3.943  0.00428 **
## Total_Revenue          0.1121          0.1589   0.706  0.50040   
## IGF_TE        -103977254.9573   23393596.4180  -4.445  0.00215 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2216000 on 8 degrees of freedom
## Multiple R-squared:  0.752,  Adjusted R-squared:  0.6899 
## F-statistic: 12.13 on 2 and 8 DF,  p-value: 0.003786
# IGF_TE vs Population and Total Revenue
model_igfte_pop_rev <- lm(IGF_TE ~ Population + Total_Revenue, data = Cleaned_TaMA_Data)
summary(model_igfte_pop_rev)
## 
## Call:
## lm(formula = IGF_TE ~ Population + Total_Revenue, data = Cleaned_TaMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.02625 -0.01534 -0.00533  0.01590  0.03440 
## 
## Coefficients:
##                      Estimate      Std. Error t value Pr(>|t|)  
## (Intercept)    0.051157749331  0.040855907353   1.252   0.2459  
## Population     0.000000329638  0.000000103393   3.188   0.0128 *
## Total_Revenue -0.000000007042  0.000000002191  -3.214   0.0124 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02222 on 8 degrees of freedom
## Multiple R-squared:  0.5984, Adjusted R-squared:  0.498 
## F-statistic:  5.96 on 2 and 8 DF,  p-value: 0.02602
#  Visualizations

# Scatter plot: Total Revenue vs Population
ggplot(Cleaned_TaMA_Data, aes(x = Population, y = Total_Revenue)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Total Revenue vs Population", x = "Population", y = "Total Revenue") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma)

# Scatter plot: Total Expenditure vs Population
ggplot(Cleaned_TaMA_Data, aes(x = Population, y = Total_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Total Expenditure vs Population", x = "Population", y = "Total Expenditure") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma)

# Scatter plot: Capital Expenditure vs Total Revenue
ggplot(Cleaned_TaMA_Data, aes(x = Total_Revenue, y = Capital_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Capital Expenditure vs Total Revenue", x = "Total Revenue", y = "Capital Expenditure") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma)

# Scatter plot: IGF_TE vs Population
ggplot(Cleaned_TaMA_Data, aes(x = Population, y = IGF_TE)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "IGF_TE vs Population", x = "Population", y = "IGF_TE") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = percent_format(accuracy = 1))

ggplot(Cleaned_TaMA_Data, aes(x = Total_Revenue, y = IGF_TE)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "IGF_TE vs Total Revenue", x = "Total Revenue", y = "IGF_TE") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = percent_format(accuracy = 1))

In the regression results above, we found a significant linear relationship between between Total Revenue and Population, Total Expenditure and Population, and Capital Expenditure, Total Revenue, and between IGF_TE vs Population and Total Revenue.

2.4 What is the relationship between service delivery and revenue and expenditure patterns?

# no variables

2.5 SHEET 3